Using statistical language modelling to identify new vocabulary in a grammar-based speech recognition system

Author

  • Genevieve Gorrell
Abstract

Spoken language recognition runs into difficulty when it encounters an unknown word. Not only is the new word itself unrecognisable, but its presence also degrades recognition of the surrounding words. This study explores using a back-off statistical recogniser to recognise out-of-vocabulary words in a grammar-based speech recognition system. It shows that a statistical language model, built from a corpus collected with a grammar-based system and augmented with minimally constrained, domain-appropriate material, can extract words that are outside the grammar's vocabulary from an unseen corpus with fairly high precision.
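The idea in the abstract can be illustrated with a toy sketch. It assumes a bigram model with a simplified, unnormalised back-off score (not necessarily the smoothing used in the study), trained on grammar-derived sentences plus looser domain material. Words in a recognised hypothesis that fall outside the grammar's vocabulary are then listed as candidate new vocabulary. All function names, the `alpha` weight, and the sample sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def backoff_score(prev, word, unigrams, bigrams, alpha=0.4):
    """Relative bigram frequency if the bigram was seen; otherwise back
    off to a discounted unigram frequency. This is an unnormalised score,
    not a true probability (an assumption made to keep the sketch short)."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    total = sum(unigrams.values())
    return alpha * unigrams[word] / total


def extract_new_words(hypothesis, grammar_vocab):
    """Return the words of a recognised hypothesis that the grammar lacks."""
    return [w for w in hypothesis.split() if w not in grammar_vocab]


# Hypothetical data: two grammar-derived sentences plus one looser
# domain-appropriate sentence used to augment the training corpus.
grammar_corpus = ["turn on the light", "turn off the light"]
domain_corpus = ["switch on the lamp"]
grammar_vocab = {w for s in grammar_corpus for w in s.split()}

unigrams, bigrams = train_bigram_counts(grammar_corpus + domain_corpus)
new_words = extract_new_words("switch on the lamp", grammar_vocab)
```

Because the model backs off to unigram evidence, a word such as "lamp" still receives a non-zero score after a context in which it was never seen, which is what lets a statistical recogniser hypothesise words the grammar cannot produce.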


Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...


The SpeeD Grammar-based ASR System for the Romanian Language

This paper describes the grammar-based automatic speech recognition system for the Romanian language developed by the Speech and Dialogue Research Group. The paper links to previous work for the issues related to large vocabulary speech recognition and focuses on the specific optimization work done for several closed-vocabulary, grammar-based speech recognition tasks. Among the specific problem...


Subword lexical modelling for speech recognition

In this work, we introduce and develop a novel framework, ANGIE, for modelling subword lexical phenomena in speech recognition. Our framework provides a flexible and powerful mechanism for capturing morphology, syllabification, phonology, and other subword effects in a hierarchical manner which maximizes sharing of subword structures. ANGIE models the subword structure within a context-free grammar...


A Unified Framework for Sublexical and Linguistic Modelling Supporting Flexible Vocabulary Speech Understanding

In [9], we introduced the ANGIE framework for modelling speech where morphological and phonological substructures of words are jointly characterized by a context-free grammar and represented in a multi-layered hierarchical structure. In [6], we demonstrated a competitive word-spotter based on the ANGIE framework and presented several results comparing the performance of various sublexical fille...




Publication date: 2003